Accelerated Gradient Methods for Stochastic Optimization and Online Learning
نویسندگان
چکیده
Regularized risk minimization often involves non-smooth optimization, either because of the loss function (e.g., hinge loss) or the regularizer (e.g., l1-regularizer). Gradient methods, though highly scalable and easy to implement, are known to converge slowly. In this paper, we develop a novel accelerated gradient method for stochastic optimization while still preserving their computational simplicity and scalability. The proposed algorithm, called SAGE (Stochastic Accelerated GradiEnt), exhibits fast convergence rates on stochastic composite optimization with convex or strongly convex objectives. Experimental results show that SAGE is faster than recent (sub)gradient methods including FOLOS, SMIDAS and SCD. Moreover, SAGE can also be extended for online learning, resulting in a simple algorithm but with the best regret bounds currently known for these problems.
منابع مشابه
Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization
We consider regularized stochastic learning and online optimization problems, where the objective function is the sum of two convex terms: one is the loss function of the learning task, and the other is a simple regularization term such as !1-norm for promoting sparsity. We develop extensions of Nesterov’s dual averaging method, that can exploit the regularization structure in an online setting...
متن کاملAccelerating Stochastic Composition Optimization
Consider the stochastic composition optimization problem where the objective is a composition of two expected-value functions. We propose a new stochastic firstorder method, namely the accelerated stochastic compositional proximal gradient (ASC-PG) method, which updates based on queries to the sampling oracle using two different timescales. The ASC-PG is the first proximal gradient method for t...
متن کاملWithout-Replacement Sampling for Stochastic Gradient Methods: Convergence Results and Application to Distributed Optimization
Stochastic gradient methods for machine learning and optimization problems are usually analyzed assuming data points are sampled with replacement. In practice, however, sampling without replacement is very common, easier to implement in many cases, and often performs better. In this paper, we provide competitive convergence guarantees for without-replacement sampling, under various scenarios, f...
متن کاملProjection-Free Online Optimization with Stochastic Gradient: From Convexity to Submodularity
Online optimization has been a successful framework for solving large-scale problems under computational constraints and partial information. Current methods for online convex optimization require either a projection or exact gradient computation at each step, both of which can be prohibitively expensive for large-scale applications. At the same time, there is a growing trend of non-convex opti...
متن کاملTheory of Convex Optimization for Machine Learning
This monograph presents the main mathematical ideas in convex optimization. Starting from the fundamental theory of black-box optimization, the material progresses towards recent advances in structural optimization and stochastic optimization. Our presentation of black-box optimization, strongly influenced by the seminal book of Nesterov, includes the analysis of the Ellipsoid Method, as well a...
متن کامل